Limitations of Cross-Lingual Learning from Image Search
Authors
Abstract
Cross-lingual representation learning is an important step in making NLP scale to all the world’s languages. Recent work on bilingual lexicon induction suggests that it is possible to learn cross-lingual representations of words based on similarities between the images associated with these words. However, that work focused only on the translation of selected nouns. In our work, we investigate whether the meaning of other parts of speech, in particular adjectives and verbs, can be learned in the same way. We also experiment with combining the representations learned from visual data with embeddings learned from textual data. Our experiments across five language pairs indicate that the approach of previous work does not scale to learning cross-lingual representations beyond simple nouns.
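The abstract does not spell out how image similarity is turned into translations. As a rough illustration of the general idea behind image-based bilingual lexicon induction, here is a minimal Python sketch. It is not the paper's implementation, and several assumptions are involved. Each word is represented by a bank of image feature vectors, for example CNN features of its top image-search results. Words are compared with an average-of-maximum ("AvgMax") cosine similarity, one measure used in earlier image-based lexicon induction work. Visual and textual scores are combined by simple linear interpolation with a hypothetical weight `alpha`. The function names and the toy data are hypothetical.

```python
import numpy as np


def l2_normalize(x):
    """Scale each row to unit length so that dot products are cosine similarities."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, 1e-12)  # guard against all-zero rows


def avg_max_sim(src_imgs, tgt_imgs):
    """AvgMax: for every source image take its best cosine match among the
    target images, then average these maxima over the source images."""
    sims = l2_normalize(src_imgs) @ l2_normalize(tgt_imgs).T  # shape (n_src, n_tgt)
    return float(sims.max(axis=1).mean())


def rank_translations(src_word, src_bank, tgt_bank, text_scores=None, alpha=0.5):
    """Return target-language words sorted from most to least likely translation.

    src_bank, tgt_bank: hypothetical dicts mapping a word to an (n_images, d)
        array of image feature vectors for that word's image search results.
    text_scores: optional dict mapping a target word to a similarity taken from
        a text-based cross-lingual embedding space (assumed, not from the paper).
    alpha: weight of the visual score in the linear interpolation (assumed).
    """
    scores = {}
    for tgt_word, tgt_imgs in tgt_bank.items():
        visual = avg_max_sim(src_bank[src_word], tgt_imgs)
        if text_scores is None:
            scores[tgt_word] = visual
        else:
            textual = text_scores.get(tgt_word, 0.0)  # missing words score 0
            scores[tgt_word] = alpha * visual + (1 - alpha) * textual
    return sorted(scores, key=scores.get, reverse=True)


# Toy example with made-up 3-dimensional "image features", two images per word.
src_bank = {"hund": np.array([[1.0, 0.1, 0.0], [0.9, 0.0, 0.1]])}
tgt_bank = {
    "dog": np.array([[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]]),
    "red": np.array([[0.0, 1.0, 0.0], [0.0, 0.9, 0.3]]),
}
print(rank_translations("hund", src_bank, tgt_bank))  # ['dog', 'red']
```

AvgMax rewards a candidate if most source images have at least one close match among its images, which tolerates some noise in search results. That kind of signal is plausible for concrete nouns with visually consistent results. The abstract's finding suggests it carries over poorly to adjectives and verbs.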
Similar resources
Multilingual Plagiarism Detection
Cross-lingual plagiarism detection has recently caught attention due to copyright violations occurring in many fields such as education, journalism, scientific research, literature, screenplays, etc., where an author would translate an article in language L1 into language L2 and then either publish/submit it or change some of the sentences to suit his/her motivations. Therefore, the need for a ...
Cross-Lingual Image Search on the Web
Most people locate images on the Web by querying image search engines such as Google’s. The images are tagged by the words in their “vicinity”, which limits the ability of a searcher to retrieve them. Although images are universal, an English searcher will fail to find images tagged in Chinese, and a Spanish searcher will fail to find images tagged in English. Cross-lingual homonyms cause probl...
Semi-Supervised Representation Learning for Cross-Lingual Text Classification
Cross-lingual adaptation aims to learn a prediction model in a label-scarce target language by exploiting labeled data from a label-rich source language. An effective cross-lingual adaptation system can substantially reduce the manual annotation effort required in many natural language processing tasks. In this paper, we propose a new cross-lingual adaptation approach for document classification ...
Image-Mediated Learning for Zero-Shot Cross-Lingual Document Retrieval
We propose an image-mediated learning approach for cross-lingual document retrieval where no or only a few parallel corpora are available. Using the images in image-text documents of each language as the hub, we derive a common semantic subspace bridging two languages by means of generalized canonical correlation analysis. For the purpose of evaluation, we create and release a new document data...
Learning Cross-lingual Word Embeddings via Matrix Co-factorization
A joint-space model for cross-lingual distributed representations generalizes language-invariant semantic features. In this paper, we present a matrix co-factorization framework for learning cross-lingual word embeddings. We explicitly define monolingual training objectives in the form of matrix decomposition, and induce cross-lingual constraints for simultaneously factorizing monolingual matric...
Journal title:
- CoRR
Volume: abs/1709.05914
Publication date: 2017